Goto

Collaborating Authors

 oblique decision tree


Feature Learning for Interpretable, Performant Decision Trees

Neural Information Processing Systems

Decision trees are regarded for high interpretability arising from their hierarchical partitioning structure built on simple decision rules. However, in practice, this is not realized because axis-aligned partitioning of realistic data results in deep trees, and because ensemble methods are used to mitigate overfitting. Even then, model complexity and performance remain sensitive to transformation of the input, and extensive expert crafting of features from the raw data is common. We propose the first system to alternate sparse feature learning with differentiable decision tree construction to produce small, interpretable trees with good performance. It benchmarks favorably against conventional tree-based models and demonstrates several notions of interpretability of a model and its predictions.


Feature Learning for Interpretable, Performant Decision Trees

Neural Information Processing Systems

Points were sampled uniformly in the bands denoted by dashed lines. We posit that these barriers are due, at least in part, to the sensitivity of decision trees to transformations of the input resulting from greedy construction and simple decision rules. Of these, key limitation is the latter; even if we replace greedy construction with a perfect tree learner, simple distributions can nonetheless require an arbitrarily large axis-aligned tree to fit.


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The paper introduces an efficient feature transform of local decorrelation, which when combined with boosted (orthogonal) decision trees, considerably improves over the state-of-the-art on pedestrian detection. Overall, it is a clearly (and nicely) written paper with good analysis, enough details and solid experiments. Pros: - Very well written and executed paper - Attention to detail - Solid results - Straight forward and intuitive method Cons: - Incremental from Hariharan et al. (not major, see later) - If it claims ``Improved Detection'', as opposed to ``Improved Pedestrian Detection'', then I would have liked to see some more results on object detection or likewise. Going from global to local decorrelation, and doing the right analysis for design decisions set it apart.


CART-ELC: Oblique Decision Tree Induction via Exhaustive Search

arXiv.org Artificial Intelligence

Oblique decision trees have attracted attention due to their potential for improved classification performance over traditional axis-aligned decision trees. However, methods that rely on exhaustive search to find oblique splits face computational challenges. As a result, they have not been widely explored. We introduce a novel algorithm, Classification and Regression Tree - Exhaustive Linear Combinations (CART-ELC), for inducing oblique decision trees that performs an exhaustive search on a restricted set of hyperplanes. We then investigate the algorithm's computational complexity and its predictive capabilities. Our results demonstrate that CART-ELC consistently achieves competitive performance on small datasets, often yielding statistically significant improvements in classification accuracy relative to existing decision tree induction algorithms, while frequently producing shallower, simpler, and thus more interpretable trees.


ODTE -- An ensemble of multi-class SVM-based oblique decision trees

arXiv.org Artificial Intelligence

We propose ODTE, a new ensemble that uses oblique decision trees as base classifiers. Additionally, we introduce STree, the base algorithm for growing oblique decision trees, which leverages support vector machines to define hyperplanes within the decision nodes. We embed a multiclass strategy -- one-vs-one or one-vs-rest -- at the decision nodes, allowing the model to directly handle non-binary classification tasks without the need to cluster instances into two groups, as is common in other approaches from the literature. In each decision node, only the best-performing model SVM -- the one that minimizes an impurity measure for the n-ary classification -- is retained, even if the learned SVM addresses a binary classification subtask. An extensive experimental study involving 49 datasets and various state-of-the-art algorithms for oblique decision tree ensembles has been conducted. Our results show that ODTE ranks consistently above its competitors, achieving significant performance gains when hyperparameters are carefully tuned. Moreover, the oblique decision trees learned through STree are more compact than those produced by other algorithms evaluated in our experiments.


Yet Another Representation of Binary Decision Trees: A Mathematical Demonstration

arXiv.org Artificial Intelligence

A decision tree looks like a simple computational graph without cycles, where only the leaf nodes specify the output values and the non-terminals specify their tests or split conditions. From the numerical perspective, we express decision trees in the language of computational graph. We explicitly parameterize the test phase, traversal phase and prediction phase of decision trees based on the bitvectors of non-terminal nodes. As shown later, the decision tree is a shallow binary network in some sense. Especially, we introduce the bitvector matrix to implement the tree traversal in numerical approach, where the core is to convert the logical `AND' operation to arithmetic operations. And we apply this numerical representation to extend and unify diverse decision trees in concept.


Enhancing Group Fairness in Online Settings Using Oblique Decision Forests

arXiv.org Artificial Intelligence

Fairness, especially group fairness, is an important consideration in the context of machine learning systems. The most commonly adopted group fairness-enhancing techniques are in-processing methods that rely on a mixture of a fairness objective (e.g., demographic parity) and a task-specific objective (e.g., cross-entropy) during the training process. However, when data arrives in an online fashion - one instance at a time - optimizing such fairness objectives poses several challenges. In particular, group fairness objectives are defined using expectations of predictions across different demographic groups. In the online setting, where the algorithm has access to a single instance at a time, estimating the group fairness objective requires additional storage and significantly more computation (e.g., forward/backward passes) than the task-specific objective at every time step. In this paper, we propose Aranyani, an ensemble of oblique decision trees, to make fair decisions in online settings. The hierarchical tree structure of Aranyani enables parameter isolation and allows us to efficiently compute the fairness gradients using aggregate statistics of previous decisions, eliminating the need for additional storage and forward/backward passes. We also present an efficient framework to train Aranyani and theoretically analyze several of its properties. We conduct empirical evaluations on 5 publicly available benchmarks (including vision and language datasets) to show that Aranyani achieves a better accuracy-fairness trade-off compared to baseline approaches. Critical applications of machine learning, such as hiring (Dastin, 2022) and criminal recidivism (Larson et al., 2016), require special attention to avoid perpetuating biases present in training data (Corbett-Davies et al., 2017; Buolamwini & Gebru, 2018; Raji & Buolamwini, 2019). Group fairness, which is a well-studied paradigm for mitigating such biases in machine learning (Mehrabi et al., 2021; Hort et al., 2022), tries to achieve statistical parity of a system's predictions among different demographic (or protected) groups (e.g., gender or race). Most of these approaches rely on group fairness objectives that are optimized alongside task-specific objectives in an offline setting (Dwork et al., 2012). Group fairness objectives (e.g., demographic parity) are defined using expectations of predictions across different demographic groups, which requires the system to have access to labeled data from different groups.


Heterogeneous Oblique Double Random Forest

arXiv.org Artificial Intelligence

The decision tree ensembles use a single data feature at each node for splitting the data. However, splitting in this manner may fail to capture the geometric properties of the data. Thus, oblique decision trees generate the oblique hyperplane for splitting the data at each non-leaf node. Oblique decision trees capture the geometric properties of the data and hence, show better generalization. The performance of the oblique decision trees depends on the way oblique hyperplanes are generate and the data used for the generation of those hyperplanes. Recently, multiple classifiers have been used in a heterogeneous random forest (RaF) classifier, however, it fails to generate the trees of proper depth. Moreover, double RaF studies highlighted that larger trees can be generated via bootstrapping the data at each non-leaf node and splitting the original data instead of the bootstrapped data recently. The study of heterogeneous RaF lacks the generation of larger trees while as the double RaF based model fails to take over the geometric characteristics of the data. To address these shortcomings, we propose heterogeneous oblique double RaF. The proposed model employs several linear classifiers at each non-leaf node on the bootstrapped data and splits the original data based on the optimal linear classifier. The optimal hyperplane corresponds to the models based on the optimized impurity criterion. The experimental analysis indicates that the performance of the introduced heterogeneous double random forest is comparatively better than the baseline models. To demonstrate the effectiveness of the proposed heterogeneous double random forest, we used it for the diagnosis of Schizophrenia disease. The proposed model predicted the disease more accurately compared to the baseline models.


Multiple Instance Learning with Trainable Decision Tree Ensembles

arXiv.org Artificial Intelligence

A new random forest based model for solving the Multiple Instance Learning (MIL) problem under small tabular data, called Soft Tree Ensemble MIL (STE-MIL), is proposed. A new type of soft decision trees is considered, which is similar to the well-known soft oblique trees, but with a smaller number of trainable parameters. In order to train the trees, it is proposed to convert them into neural networks of a specific form, which approximate the tree functions. It is also proposed to aggregate the instance and bag embeddings (output vectors) by using the attention mechanism. The whole STE-MIL model, including soft decision trees, neural networks, the attention mechanism and a classifier, is trained in an end-to-end manner. Numerical experiments with tabular datasets illustrate STE-MIL. The corresponding code implementing the model is publicly available.


Evolutionary learning of interpretable decision trees

arXiv.org Artificial Intelligence

Reinforcement learning techniques achieved human-level performance in several tasks in the last decade. However, in recent years, the need for interpretability emerged: we want to be able to understand how a system works and the reasons behind its decisions. Not only we need interpretability to assess the safety of the produced systems, we also need it to extract knowledge about unknown problems. While some techniques that optimize decision trees for reinforcement learning do exist, they usually employ greedy algorithms or they do not exploit the rewards given by the environment. This means that these techniques may easily get stuck in local optima. In this work, we propose a novel approach to interpretable reinforcement learning that uses decision trees. We present a two-level optimization scheme that combines the advantages of evolutionary algorithms with the advantages of Q-learning. This way we decompose the problem into two sub-problems: the problem of finding a meaningful and useful decomposition of the state space, and the problem of associating an action to each state. We test the proposed method on three well-known reinforcement learning benchmarks, on which it results competitive with respect to the state-of-the-art in both performance and interpretability. Finally, we perform an ablation study that confirms that using the two-level optimization scheme gives a boost in performance in non-trivial environments with respect to a one-layer optimization technique.